Skip N-grams and Ranking Functions for Predicting Script Events
Abstract
In this paper, we extend current state-of-the-art research on unsupervised acquisition of scripts, that is, stereotypical and frequently observed sequences of events. We design, evaluate and compare different methods for constructing models for script event prediction: given a partial chain of events in a script, predict other events that are likely to belong to the script. Our work aims to answer key questions about how best to (1) identify representative event chains from a source text, (2) gather statistics from the event chains, and (3) choose ranking functions for predicting new script events. We make several contributions, introducing skip-grams for collecting event statistics, designing improved methods for ranking event predictions, defining a more reliable evaluation metric for measuring predictiveness, and providing a systematic analysis of the various event prediction models.
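To make the idea of collecting skip-gram statistics from event chains concrete, here is a minimal sketch. It is not the paper's implementation: the function names, the event labels, and the scoring rule (summing pair counts over the observed events) are assumptions chosen for illustration. The abstract does not specify the ranking functions the authors actually compare.

```python
from collections import Counter
from itertools import combinations


def skip_bigrams(chain, max_skip):
    """Return ordered event pairs with at most `max_skip` events between them.

    With max_skip=0 this reduces to ordinary (adjacent) bigrams.
    """
    return [
        (chain[i], chain[j])
        for i, j in combinations(range(len(chain)), 2)
        if j - i - 1 <= max_skip
    ]


def count_pairs(chains, max_skip):
    """Accumulate skip-bigram counts over a collection of event chains."""
    counts = Counter()
    for chain in chains:
        counts.update(skip_bigrams(chain, max_skip))
    return counts


def rank_candidates(partial_chain, candidates, counts):
    """Rank candidate events by a simple illustrative score.

    Assumption: the score of a candidate is the sum of its pair counts
    with every event already observed in the partial chain. Ties are
    broken alphabetically so the output is deterministic.
    """
    scores = {
        cand: sum(counts[(event, cand)] for event in partial_chain)
        for cand in candidates
    }
    return sorted(candidates, key=lambda cand: (-scores[cand], cand))


if __name__ == "__main__":
    # Hypothetical restaurant-script chains, invented for this example.
    chains = [
        ["enter", "order", "eat", "pay", "leave"],
        ["enter", "order", "pay", "leave"],
    ]
    counts = count_pairs(chains, max_skip=1)
    print(rank_candidates(["enter", "order"], ["pay", "eat", "leave"], counts))
```

Allowing skips means a pair such as ("enter", "eat") is counted even though "order" intervenes, so sparse chains contribute more statistics than they would under adjacent bigrams alone.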
Similar resources
Modeling Harmony with Skip-Grams
String-based (or viewpoint) models of tonal harmony often struggle with data sparsity in pattern discovery and prediction tasks, particularly when modeling composite events like triads and seventh chords, since the number of distinct n-note combinations in polyphonic textures is potentially enormous. To address this problem, this study examines the efficacy of skip-grams in music research, an a...
Learning Linguistic Biomarkers for Predicting Mild Cognitive Impairment using Compound Skip-grams
Predicting Mild Cognitive Impairment (MCI) is currently a challenge as existing diagnostic criteria rely on neuropsychological examinations. Automated Machine Learning (ML) models that are trained on verbal utterances of MCI patients can aid diagnosis. Using a combination of skip-gram features, our model learned several linguistic biomarkers to distinguish between 19 patients with MCI and 19 he...
A Closer Look at Skip-gram Modelling
Data sparsity is a major problem in natural language processing: language is a system of rare events, so varied and complex that even with an extremely large corpus we can never accurately model all possible strings of words. This paper examines the use of skip-grams (a technique whereby n-grams are still stored to model language, but they allow for tokens to be ...
Ensemble classifier for Twitter sentiment analysis
In this paper, we present a combination of different types of sentiment analysis approaches in order to improve on their individual performance. These consist of (I) ranking algorithms for scoring sentiment features such as bi-grams and skip-grams extracted from annotated corpora; (II) a polarity classifier based on a deep learning algorithm; and (III) a semi-supervised system founded on the...
Automatic Evaluation of Machine Translation Quality Using Longest Common Subsequence and Skip-Bigram Statistics
In this paper we describe two new objective automatic evaluation methods for machine translation. The first method is based on the longest common subsequence between a candidate translation and a set of reference translations. Longest common subsequence naturally takes into account sentence-level structure similarity and automatically identifies the longest co-occurring in-sequence n-grams. The second m...